Risk-Variant Policy Switching to Exceed Reward Thresholds

Authors

  • Breelyn Melissa Kane
  • Reid Simmons
Abstract

This paper presents a decision-theoretic planning approach for probabilistic environments where the agent’s goal is to win, which we model as maximizing the probability of being above a given reward threshold. In competitive domains, second is as good as last, and it is often desirable to take risks if one is in danger of losing, even if the risk does not pay off very often. Our algorithm maximizes the probability of being above a particular reward threshold by dynamically switching between a suite of policies, each of which encodes a different level of risk. This method does not explicitly encode time or reward into the state space, and decides when to switch between policies during each execution step. We compare a risk-neutral policy to switching among different risk-sensitive policies, and show that our approach improves the agent’s probability of winning.
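The switching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that, for each policy in the suite, a set of sampled returns-to-go from the current state is already available, and it simply picks the policy whose empirical probability of lifting the total reward above the threshold is highest. The names `prob_exceeding`, `choose_policy`, `return_samples`, and the policy labels are hypothetical.

```python
from bisect import bisect_right


def prob_exceeding(samples, needed):
    """Empirical P(return-to-go > needed) from a list of sampled returns."""
    ordered = sorted(samples)
    # bisect_right counts the samples that are <= needed.
    above = len(ordered) - bisect_right(ordered, needed)
    return above / len(ordered)


def choose_policy(return_samples, reward_so_far, threshold):
    """Pick the policy most likely to push the total reward above threshold.

    return_samples maps a policy name to sampled returns-to-go from the
    current state (an assumption of this sketch, not part of the paper).
    """
    needed = threshold - reward_so_far
    return max(
        return_samples,
        key=lambda name: prob_exceeding(return_samples[name], needed),
    )


# Hypothetical sampled returns for two policies with different risk levels.
return_samples = {
    "risk_neutral": [4, 5, 5, 6],   # higher mean, low variance
    "risk_seeking": [0, 0, 1, 12],  # lower mean, occasional large payoff
}

# Ahead of schedule: the low-variance policy is the safer way to win.
print(choose_policy(return_samples, reward_so_far=6, threshold=10))
# Far behind: only the risky policy can still exceed the threshold.
print(choose_policy(return_samples, reward_so_far=0, threshold=10))
```

Calling `choose_policy` again at every execution step, with the updated `reward_so_far` and fresh samples for the new state, gives the per-step switching behaviour the abstract describes, under the assumptions stated above.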

Similar resources

Estimation of New Weighted Controlled Switching Overvoltage by RBFN Model

Mitigating switching overvoltages (SOVs) and conducting well-suited insulation coordination for handling stresses are very important in UHV transmission lines. The best strategy in the absence of arresters is controlled switching (CS). Although elaborate works on electromagnetic transients are considered in the process of designing transmission systems, such works are not prevalent in day-to-da...

Asymmetric Effects of Monetary Policy and Business Cycles in Iran using Markov-switching Models

This paper investigates the asymmetric effects of monetary policy on economic growth over business cycles in Iran. Estimating the models using the Hamilton (1989) Markov-switching model and by employing the data for 1960-2012, the results well identify two regimes characterized as expansion and recession. Moreover, the results show that an expansionary monetary policy has a positive and statist...

Effects and costs of requiring child-restraint systems for young children traveling on commercial airplanes.

CONTEXT The US Federal Aviation Administration is planning a new regulation requiring children younger than 2 years to ride in approved child-restraint seats on airplanes. OBJECTIVES To estimate the annual number of child air crash deaths that might be prevented by the proposed regulation, the threshold proportion of families switching from air to car travel above which the risks of the polic...

On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem

The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without any smoothness properties of the optimal reward function neither for the global problem nor for the individual stopping probl...

Decision Making in the Reward and Punishment Variants of the Iowa Gambling Task: Evidence of “Foresight” or “Framing”?

Surface-level differences in the reward and punishment variants, specifically greater long-term decision making in the punishment variant of the Iowa Gambling Task (IGT) observed in previous studies led to the present comparison of long-term decision making in the two IGT variants (n = 320, male = 160). It was contended that risk aversion triggered by a positive frame of the reward variant and ...


Publication date: 2012